PAW: A Platform for Analytics Workflows

Authors

  • Maxim Filatov
  • Verena Kantere
Abstract

Big Data analytics in science and industry are performed on a range of heterogeneous data stores, both traditional and modern, and on a diversity of query engines. Workflows are difficult to design and implement since they span a variety of systems. To reduce development time and processing costs, automation is needed. We present PAW, a platform to manage analytics workflows. PAW enables workflow design, execution, analysis and optimization with respect to time efficiency, over multiple execution engines, namely a DBMS, a MapReduce engine, and an orchestration engine. This configuration is emerging as a common paradigm for combining analysis of unstructured data with analysis of structured data (e.g., NoSQL plus SQL). The demonstration of PAW focuses on the usability of the platform for users with varying levels of expertise, the automation of the analysis and optimization of execution, and the effect of optimization on workflow execution. The demonstration scenarios are based on synthetic and real workflows on real data.

Similar resources

A Fuzzy TOPSIS Approach for Big Data Analytics Platform Selection

Big data sizes are constantly increasing. Big data analytics is where advanced analytic techniques are applied on big data sets. Analytics based on large data samples reveals and leverages business change. The popularity of big data analytics platforms, which are often available as open-source, has not remained unnoticed by big companies. Google uses MapReduce for PageRank and inverted indexes....

Optimizing, Planning and Executing Analytics Workflows over Multiple Engines

Big data analytics have become a necessity to businesses worldwide. The complexity of the tasks they execute is ever increasing due to the surge in data and task heterogeneity. Current analytics platforms, while successful in harnessing multiple aspects of this “data deluge”, bind their efficacy to a single data and compute model and often depend on proprietary systems. However, no single execu...

From the Desktop to the Grid and Cloud: Conversion of KNIME Workflows to WS-PGRADE

Computational analyses for research usually consist of a complicated orchestration of data flows, software libraries, visualization, selection of adequate parameters, etc. Structuring these complex activities into a collaboration of simple, reproducible and well defined tasks brings down complexity and increases reproducibility. This is the basic notion of workflows. Workflow engines allow user...

Analyzing usage: Visualizing end-user workflows to drive product development

This article illustrates the importance of harvesting usage analytics to improve user experience on a platform. In the shift from print to digital, user engagement has replaced content as the key contributor of growth. Visualizing end-user workflows through a content platform allows organizations to understand what features contribute to higher engagement or just clutter the interface and negat...

SmartR: an open-source platform for interactive visual analytics for translational research data

Summary: In translational research, efficient knowledge exchange between the different fields of expertise is crucial. An open platform that is capable of storing a multitude of data types such as clinical, pre-clinical or OMICS data combined with strong visual analytical capabilities will significantly accelerate the scientific progress by making data more accessible and hypothesis generation e...

Publication date: 2016